-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Save models to hdf5 and other formats #160
Conversation
MAINT Include test folder in wheels
MAINT More fixes to include test/ folder in sdist
Hello @kushalkolar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-01-20 21:23:44 UTC |
Hi @kushalkolar Thank you very much for your suggestion. |
I think it would be useful for any tslearn model where it makes sense. Using the structure shown here for KShape, I can create a base class that uses a pickling is the only answer I could find for how sklearn does it (and that's what their docs say too https://scikit-learn.org/stable/modules/model_persistence.html). Though there are issues & limitations with pickling of course (some are outlined in their docs). |
Yes, it could be a great idea to have a base class for serializable models so that code does not have to be duplicated everywhere! |
Ideally, this could be a more general solution than #57 for example. |
Sorry I haven't had time to work on implementing this yet. Yes I will definitely try to make a generalized solution to this. |
@kushalkolar : note that you still have issues with tests:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few changes still needed to make it compatible on all supported versions, but it's almost done!
There are some docstring errors for TimeSeriesMinMax
but I suspect that it comes #177
tslearn/hdftools.py
Outdated
# h5file[path + key].attrs['dtype'] = item.dtype.str | ||
|
||
# single pieces of data | ||
elif isinstance(item, (str, bytes, int, float, np.int, np.int8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use parent classes if you don't want to add them all manually (e.g. replacing np.int, ..., np.int64 with np.integer), but there are some differences (for instance unsigned integers):
https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
@rtavenar Do you know why the CI is displayed as successful, although it is not? #177 was merged although there are some errors... |
@johannfaouzi Yep, I noticed that this morning and updated |
2. Organize test_serialize_models to have a generalized params_predict_check function.
_X_fit so that _X_fit doesn't have to be carried over when saving the model
renamed some things in test_serialize_models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work!
I have few minor comments, and the tests on Travis CI seem to pass, so we could target a merge pretty soon.
Also, once the changes are made, could you update CHANGELOG.md
to document that your new functionality has been added to tslearn
since version 0.3.0?
Thanks,
Romain
PS: let me know when you are done with integrating new models to the list
def test_global_alignment_kernel_kmeans(): | ||
seed = 0 | ||
numpy.random.seed(seed) | ||
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, it would be better practice to rely on fully synthetic data (generated from numpy.random.rand
for example, with adequate dimensions) so that you fully control the shapes and do not rely on other parts of the code for the test.
Same comment holds for tests related to all estimators.
Thanks for the help! Almost finished with this, a few questions: If I understand
Now Continuing to add this to all the models right now. |
Yes, for these two models, the training data needs to be known. But your point is important: could you, for these models, have a |
serializability 2. small changes for _check_not_fitted and better formatting of _check_params_predict
use for models that don't inherit from BaseEstimator
I'm having a weird issue with
I'm not able to reproduce this with pytest on my local machine. edit: I also changed |
Hi, The thing is that if self.metric in VARIABLE_LENGTH_METRICS:
check_is_fitted(self, '_ts_fit')
else:
check_is_fitted(self, '_X_fit') Similarly, the list of attributes to be saved should depend on the same condition |
I can't reproduce this either locally. But looking at the error, we see that the value of the but for the Edit: Nevermind, but it's changed again here: That's a bit messy... usually the values provided when the instance is created should never be changed, and other (private) attributes should be used. |
Well I'm confused, the loaded estimator should not have the |
Can't promise it will work but you can try to change the code so that However, this is only for Classifier and Regressor, which does not seem to be concerned with the current failing test... Though I don't get why the error is raised from either Classifier and Regressor although KneighborsTimeSeries is the only estimator used. Because |
Maybe a hint about where this issue comes from: It looks like the executed code comes from the |
@johannfaouzi Thanks! I tried a git rebase and it got quite messy. I think I'll just move my changes to a new PR that branches from |
Closing because #183 |
Hi,
I thought it would be useful to save the KShape model without pickling. I implemented a simple to_hdf5() method for saving a KShape model to an hdf5 file and from_hdf5() for reloading it so that predictions can be done with the model.
Changes to the KShape class:
An hdftools module is added to handle saving a dict of numpy arrays to an hdf file.
Usage: